
Poster: stbalbach Date: Nov 30, 2009 11:19am
Forum: texts Subject: PDF's on Amazon Kindle

I recently bought an Amazon Kindle DX for reading Internet Archive PDFs. However, the Kindle doesn't support most of IA's scans (incompatible image type). Of course it's possible to convert PDF -> Mobi using the program Calibre, but a lot is lost: pictures, page layout and numbers, marginalia, the font and original look of the book. In other words, everything that makes reading scanned books so much better than plain text.

I found a workaround to display PDFs on the Kindle. Basically it involves converting the DjVu version to PDF. This page explains: http://www.aeonity.com/david/how-convert-djvu-files-pdf-free-in-bulk

The solution requires the commercial Pro version of Adobe. But in the comments section some users report it working with a freeware replacement, "BullZip" (http://www.bullzip.com/download.php), or with a program called IrfanView (http://www.irfanview.com/), which has plugins making it a single-program solution. I have not tried these.

Anyway, I can now view the entire IA library on my Amazon Kindle, in PDF format, and am very happy.

Stephen
This post was modified by stbalbach on 2009-11-30 19:19:13

Reply

Poster: aibek Date: Dec 26, 2012 11:08pm
Forum: texts Subject: Re: PDF's on Amazon Kindle

Hello

Does this problem still exist with Kindle? If yes, are you sure it is due to incompatible image types? If not, please see below.

I have found that some pdf files produced by IA do not follow the PDF specification, and that causes them to fail with some programs (e.g., in pdflatex with the error ‘invalid font in reference type ’). I will write a post about this on the forum, and the solution, in due time, but for now I am looking for other instances of the failure.

Could the problem in Kindle be due to the same bug? Can you please check? The pdf file from the page mentioned below (Lyon’s Grammar) fails with pdflatex with the error mentioned above. If you are not sure about the source of the Kindle problem, can you please check (i) that it fails with Kindle too, and (ii) that the solution I propose works.

(ii): To correct -- which is a hack (for now) -- replace:

00006 0 obj
<< /F2 << /Type /Font /Subtype /Type1 /Encoding << /Differences 5 0 R >> /BaseFont /Times-Roman >>
/F2B << /Type /Font /Subtype /Type1 /Encoding << /Differences 5 0 R >> /BaseFont /Times-Bold >>
/F2I << /Type /Font /Subtype /Type1 /Encoding << /Differences 5 0 R >> /BaseFont /Times-Italic >>
/F3 << /Type /Font /Subtype /Type1 /Encoding << /Differences 5 0 R >> /BaseFont /Courier >>
>>
endobj

with:

00006 0 obj
<<>>
endobj

in the pdf file with a hex editor or a good text editor. (Essentially, just delete everything between the two outermost pairs of angle brackets.) Because the edit shortens the file, the byte offsets recorded in the xref table no longer match, so this process corrupts the xref table. PDF readers should be able to digest the file anyway, perhaps with a warning saying that “xref table is corrupted”.
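For anyone without a hex editor, the same edit can be sketched in a few lines of Python. This is only an illustration of the manual edit described above, not a tested repair tool: the function name `blank_object` and the `keep_offsets` option are my own inventions, and it assumes the object is stored as plain text in the file (not inside a compressed object stream) and does not contain the word `endobj` in its body. With `keep_offsets=True` the emptied dictionary is padded with spaces to its old length, which is a variation on the edit above that should leave later byte offsets, and hence the xref table, unchanged.

```python
import re


def blank_object(pdf: bytes, obj_num: int, keep_offsets: bool = True) -> bytes:
    """Replace the body of `<obj_num> 0 obj ... endobj` with an empty dictionary."""
    # Match the object header as quoted in the post: optional leading zeros,
    # generation number 0, and no digit immediately before the number.
    pattern = re.compile(
        rb"(?<![0-9])(0*%d 0 obj\s)(.*?)(\s*endobj)" % obj_num, re.DOTALL
    )
    match = pattern.search(pdf)
    if match is None:
        raise ValueError(f"object {obj_num} not found")
    header, body, trailer = match.groups()
    new_body = b"<<>>"
    if keep_offsets:
        # Pad with spaces so every later object keeps its old byte offset.
        new_body = new_body.ljust(len(body), b" ")
    return pdf[: match.start()] + header + new_body + trailer + pdf[match.end():]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    with open("p.30-orig.pdf", "rb") as src:
        data = src.read()
    with open("p.30-blanked.pdf", "wb") as dst:
        dst.write(blank_object(data, 6))
```

Calling `blank_object(data, 6, keep_offsets=False)` reproduces the plain deletion, including the shifted offsets that trigger the xref warning.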

I have attached two versions of a page of the book. (It is enough to check the error with one page.) p.30-orig.pdf is the original page. p.30-edited.pdf is the page with the above mentioned required edit. There are no viruses in the files. You could check by just using these two files (70 KB each).

Thanks for your help.

http://archive.org/details/analysissevenpa00unkngoog


Attachment: p.30-orig.pdf
Attachment: p.30-edited.pdf

Reply

Poster: stbalbach Date: Dec 28, 2012 11:40pm
Forum: texts Subject: Re: PDF's on Amazon Kindle

> Could the problem in Kindle be due to the same bug?

Recall it's because the Kindle DX doesn't support layered PDF documents.

I downloaded and opened the p.30's and they both display correctly, perhaps because the original is not layered.

Try this one: http://archive.org/details/devisesetembleme00lafeu

On downloading the PDF and opening it in the DX, it says "some elements on this page can not be displayed" and every page is blank (i.e. 100% of the elements can't be displayed). If you want, make the changes mentioned above and I'll try it again (I don't have the tools to edit a binary file).

Stephen

Reply

Poster: aibek Date: Dec 31, 2012 6:32am
Forum: texts Subject: Re: PDF's on Amazon Kindle

First about the file.

Every page image that we see in this file is made up of 3 images: a background, a foreground and a mask. So the pdf really contains not 116, but 348 images. The foreground and background are in RGB, the mask is BW (1 bit per sample).

To see why this is useful, imagine brown text on a yellow page. If such a 1000x1000 sample image is saved in RGB (3 bytes per pixel), it would take 3,000,000 bytes before compression. In the IA method the image is decomposed into 3 parts: a brown 1000x1000 image, a yellow 1000x1000 image, and a BW 1000x1000 mask, and each of them is compressed and saved separately in the PDF. It is the reader’s work to join them together. '0' in the mask image means the reader is to compose the final image by using the corresponding pixel of the foreground image; '1' means that the reader is to use the corresponding pixel of the background image.
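The joining step can be sketched as follows. This is a toy illustration in plain Python on a 3x3 grid, using the mask convention described in this post (0 selects the foreground pixel, 1 the background pixel); the function name `compose` and the colour values are made up for the example and say nothing about how an actual PDF reader is implemented.

```python
def compose(foreground, background, mask):
    """Pick each pixel from foreground where mask is 0, from background where it is 1."""
    return [
        [fg if m == 0 else bg for fg, bg, m in zip(fg_row, bg_row, mask_row)]
        for fg_row, bg_row, mask_row in zip(foreground, background, mask)
    ]


BROWN = (139, 69, 19)      # illustrative "ink" colour
YELLOW = (255, 255, 153)   # illustrative "paper" colour

# A tiny plus-shaped glyph: 0 marks ink (foreground), 1 marks paper (background).
mask = [
    [1, 0, 1],
    [0, 0, 0],
    [1, 0, 1],
]
foreground = [[BROWN] * 3 for _ in range(3)]
background = [[YELLOW] * 3 for _ in range(3)]
page = compose(foreground, background, mask)

# Uncompressed sizes for the 1000x1000 example in the text.
raw_rgb_bytes = 1000 * 1000 * 3      # 3 bytes per pixel
raw_mask_bytes = 1000 * 1000 // 8    # 1 bit per pixel
```

Even before compression the mask needs only 125,000 bytes against 3,000,000 for the RGB page, and the two flat colour layers compress to almost nothing.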

This helps because the three can be compressed much better. At an extreme, you can imagine the brown and the yellow images taking just a few bytes each: 3 bytes for recording the colour, and a few bytes more for recording the dimensions of the image. Also, our most critical data is in the mask -- that image has to be the sharpest. But it now has 1-bit samples, and not the 24 bits (3 bytes) per pixel we had earlier.

There are a few more details. First, the background image is saved at a lower resolution, and the reader is asked to interpolate. So a small, if insignificant, loss in quality creeps into the IA "compression". Second, the background and the foreground images are not limited to one colour -- they can be full-fledged RGB images too. The point is that the background image will be filled with the surrounding colour in the places where the (foreground) objects are -- those parts will not be used anyway -- and thus will have a more or less uniform colour or gradient. Similarly, the foreground image will be filled with nearby colours in the places where it will not be read. This way the compression is much better. The critical steps are (i) identifying the foreground and background properly, and (ii) filling colours in areas which will not be read, so that the image can be compressed best by the compression method of choice ('JPXDecode' for the RGB images in IA files). IA’s PDF files are produced by 'LuraDocument PDF v2.28'.
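To make the "lower resolution plus interpolation" point concrete, here is a minimal sketch of enlarging a small background grid to the mask's resolution. It uses nearest-neighbour repetition, which is the simplest possible stand-in; I am not claiming that this is the interpolation any particular reader uses, and the scale factor of 2 is an arbitrary assumption, not a value taken from the IA files.

```python
def upsample_nearest(image, factor):
    """Enlarge a 2-D pixel grid by an integer factor, repeating each pixel."""
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


# A 2x2 low-resolution background (values stand in for colours).
low_res = [
    [1, 2],
    [3, 4],
]
full_res = upsample_nearest(low_res, 2)
# full_res is now a 4x4 grid and can be combined pixel by pixel with a 4x4 mask.
```

Since the background is supposed to be nearly uniform anyway, storing it at a fraction of the resolution costs little visible quality while shrinking the data further.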

The compression is pretty significant. In the attached page 2 of the Devises et Emblemes file, the three images together take 42 KB. The dimensions of the composed image, however, are 2201x3063, so uncompressed RGB (3 bytes per pixel) would take 19 MB! This is the size of the image you will get when you join the three together in the intended manner.
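The arithmetic behind those figures, as a quick check. The only assumption is that "KB" and "MB" are meant in the binary sense (1024 bytes and 1024x1024 bytes); the variable names are just for this example.

```python
width, height = 2201, 3063
raw_bytes = width * height * 3        # 3 bytes per pixel, uncompressed RGB
raw_mib = raw_bytes / 2**20           # bytes -> MiB
compressed_bytes = 42 * 1024          # the three images as stored in the PDF
ratio = raw_bytes / compressed_bytes

print(f"{raw_bytes:,} bytes = {raw_mib:.1f} MiB, about {ratio:.0f}x the compressed size")
# prints: 20,224,989 bytes = 19.3 MiB, about 470x the compressed size
```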

So, most likely, Kindle is refusing to do this work. Please check for the text-layer issue too. The attached orig-p.2.pdf is p. 2 of the book. The minus-textlayer.pdf is that page minus the text layer. (Both files are about 40 KB in size.) I am assuming that your Kindle can read neither of the files.

Reply

Poster: aibek Date: Dec 31, 2012 6:38am
Forum: texts Subject: Re: PDF's on Amazon Kindle

I forgot to add the files in the previous post.

Attachment: orig-p.2.pdf
Attachment: minus-textlayer.pdf

Reply

Poster: stbalbach Date: Jan 1, 2013 2:19pm
Forum: texts Subject: Re: PDF's on Amazon Kindle

I tried both and got the same error about being unable to display elements, along with a blank page. You are probably correct that the problem is these readers don't have the CPU and/or memory for the compression.

Reply

Poster: aibek Date: Jan 5, 2013 7:52pm
Forum: texts Subject: Re: PDF's on Amazon Kindle

Btw, the pdf reader is supposed to paint two images, one on top of the other: the background image in the background, and the foreground+mask image (i.e. a cut-out of the foreground image) over it. So it is really due to the presence of layers of images, as you initially suspected.

Reply

Poster: aibek Date: Dec 29, 2012 9:44pm
Forum: texts Subject: Re: PDF's on Amazon Kindle

Since you can open the p.30-orig file in Kindle, Kindle does not share the pdftex/pdflatex problem which I was talking about.

Let me investigate the properties of the Devises et Emblemes file.

Reply

Poster: aibek Date: Dec 26, 2012 11:57pm
Forum: texts Subject: Re: PDF's on Amazon Kindle

For completeness: the p.30-edited file above has had its xref errors, plus another inconsequential error, fixed using pdftk. Another file with just the change I mentioned above (i.e., some stuff deleted) is attached to this post, with the name p.30-edited-xref-notfixed.pdf.

Attachment: p.30-edited-xref-notfixed.pdf